Gaussian processes at the Helm(holtz): A more fluid model for ocean currents
Berlinghieri, Renato, Trippe, Brian L., Burt, David R., Giordano, Ryan, Srinivasan, Kaushik, Özgökmen, Tamay, Xia, Junfei, Broderick, Tamara
Given sparse observations of buoy velocities, oceanographers are interested in reconstructing ocean currents away from the buoys and identifying divergences in a current vector field. As a first and modular step, we focus on the time-stationary case, for instance by restricting to short time periods. Since we expect current velocity to be a continuous but highly non-linear function of spatial location, Gaussian processes (GPs) offer an attractive model. But we show that applying a GP with a standard stationary kernel directly to buoy data can struggle at both current reconstruction and divergence identification, due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence-free and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate the benefits of our method with theory and experiments on synthetic and real ocean data.
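As a reading aid, the two-dimensional Helmholtz decomposition that the abstract relies on can be written out as follows. The symbols $F$, $\Phi$, $\Psi$, $k_\Phi$, $k_\Psi$ and the sign convention for the rotation operator are assumptions chosen for this sketch; the abstract does not fix them.

```latex
% Assumed notation: F is the current field, \Phi and \Psi are scalar potentials.
F(x_1, x_2) = \nabla \Phi + \operatorname{rot} \Psi
  = \begin{pmatrix} \partial_{1} \Phi \\ \partial_{2} \Phi \end{pmatrix}
  + \begin{pmatrix} \partial_{2} \Psi \\ -\partial_{1} \Psi \end{pmatrix}

% The gradient part is curl-free and the rotation part is divergence-free:
\operatorname{curl}(\nabla \Phi) = \partial_{1}\partial_{2}\Phi - \partial_{2}\partial_{1}\Phi = 0,
\qquad
\nabla \cdot (\operatorname{rot} \Psi) = \partial_{1}\partial_{2}\Psi - \partial_{2}\partial_{1}\Psi = 0

% If \Phi and \Psi are independent GPs with kernels k_\Phi and k_\Psi, then F is a GP
% whose kernel is built from mixed partial derivatives (primes denote the second argument):
K_F(x, x') =
  \begin{pmatrix}
    \partial_{1}\partial_{1'} k_\Phi & \partial_{1}\partial_{2'} k_\Phi \\
    \partial_{2}\partial_{1'} k_\Phi & \partial_{2}\partial_{2'} k_\Phi
  \end{pmatrix}
  +
  \begin{pmatrix}
    \partial_{2}\partial_{2'} k_\Psi & -\partial_{2}\partial_{1'} k_\Psi \\
    -\partial_{1}\partial_{2'} k_\Psi & \partial_{1}\partial_{1'} k_\Psi
  \end{pmatrix}
```

Under this convention the divergence of $F$ depends only on $\Phi$ (it equals $\nabla^2 \Phi$), which is why placing the prior on the potentials separates divergence from the rest of the flow.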
MPMQA: Multimodal Question Answering on Product Manuals
Zhang, Liang, Hu, Anwen, Zhang, Jing, Hu, Shuo, Jin, Qin
Visual contents, such as illustrations and images, play a big role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and only retain textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal contents but also to provide multimodal answers. To support MPMQA, a large-scale dataset, PM209, is constructed with human annotations; it contains 209 product manuals from 27 well-known consumer electronic brands. Human annotations include 6 types of semantic regions for manual contents and 22,021 question-answer pairs. In particular, each answer consists of a textual sentence and related visual regions from manuals. Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most related pages and then generating multimodal answers. We further propose a unified model that can perform these two subtasks together and achieve performance comparable to multiple task-specific models. The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA.
Supervised ensemble classification of Kepler variable stars
Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analysing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications are produced for each of the roughly 150 000 stars observed by Kepler, assigning each star to one of 14 variable star classes. The study of variable stars has provided a wealth of valuable astrophysical information. Intrinsic sources of variation, such as pulsation, provide a physical probe and test for our understanding of stellar atmospheres and interiors.
Domain Adaptive Hand Keypoint and Pixel Localization in the Wild
Ohkawa, Takehiko, Li, Yu-Jhe, Fu, Qichen, Furuta, Ryosuke, Kitani, Kris M., Sato, Yoichi
We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors). In the real world, it is important that the model trained for both tasks works under various imaging conditions. However, the variation covered by existing labeled hand datasets is limited. Thus, it is necessary to adapt the model trained on the labeled images (source) to unlabeled images (target) with unseen imaging conditions. While self-training domain adaptation methods (i.e., learning from the unlabeled target images in a self-supervised manner) have been developed for both tasks, their training may degrade performance when the predictions on the target images are noisy. To avoid this, it is crucial to assign a low importance (confidence) weight to the noisy predictions during self-training. In this paper, we propose to utilize the divergence of two predictions to estimate the confidence of the target image for both tasks. These predictions are produced by two separate networks, and their divergence helps identify the noisy predictions. To integrate our proposed confidence estimation into self-training, we propose a teacher-student framework where the two networks (teachers) provide supervision to a network (student) for self-training, and the teachers are learned from the student by knowledge distillation. Our experiments show the superiority of our method over state-of-the-art methods in adaptation settings with different lighting, grasping objects, backgrounds, and camera viewpoints. Our method improves the multi-task score on HO3D by 4% compared to the latest adversarial adaptation method. We also validate our method on Ego4D, a set of egocentric videos with rapid changes in imaging conditions outdoors.
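The idea of turning the disagreement between two predictions into a confidence weight can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the exponential mapping, the `scale` parameter, and the use of the averaged teacher output as a pseudo-label are all hypothetical choices, not the formulation used in the paper.

```python
import math

def prediction_divergence(kps_a, kps_b):
    """Mean Euclidean distance between two sets of 2D keypoint predictions."""
    dists = [math.dist(a, b) for a, b in zip(kps_a, kps_b)]
    return sum(dists) / len(dists)

def confidence_weight(kps_a, kps_b, scale=10.0):
    """Map divergence to a weight in (0, 1]; larger disagreement gives a lower weight.

    The exponential form and `scale` (in pixels) are assumptions for this sketch.
    """
    return math.exp(-prediction_divergence(kps_a, kps_b) / scale)

def weighted_keypoint_loss(student_kps, kps_a, kps_b, scale=10.0):
    """Confidence-weighted squared error between the student and a pseudo-label.

    The pseudo-label is taken to be the average of the two teacher predictions,
    which is an assumption made here for simplicity.
    """
    pseudo = [((ax + bx) / 2, (ay + by) / 2)
              for (ax, ay), (bx, by) in zip(kps_a, kps_b)]
    mse = sum(math.dist(s, p) ** 2 for s, p in zip(student_kps, pseudo)) / len(pseudo)
    return confidence_weight(kps_a, kps_b, scale) * mse

# Two teachers that agree closely yield a weight near 1; strong disagreement shrinks it.
agree = confidence_weight([(10.0, 10.0)], [(10.0, 11.0)])
disagree = confidence_weight([(10.0, 10.0)], [(40.0, 50.0)])
```

With such a weight, target images on which the two networks disagree contribute little to the self-training loss, which is the behaviour the abstract describes.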
Machine learning-based porosity estimation from spectral decomposed seismic data
Jo, Honggeun, Cho, Yongchae, Pyrcz, Michael J., Tang, Hewei, Fu, Pengcheng
Estimating porosity models from seismic data is challenging due to the signal noise and insufficient resolution of seismic data. Although impedance inversion combined with well logs is often used, several hurdles remain in retrieving sub-seismic scale porosity. As an alternative, we propose a machine learning-based workflow to convert seismic data to porosity models. A ResUNet++ based workflow is designed to take three seismic volumes at different frequencies (i.e., decomposed seismic data) and estimate the corresponding porosity model. The workflow is successfully demonstrated on a 3D channelized reservoir, estimating the porosity model with an R² score above 0.9 on both training and validation data. Moreover, the application is extended to a stress test by adding signal noise to the seismic data, and the workflow remains robust even with 5% noise. Two further ResUNet++ models are trained to take only the lowest or only the highest resolution seismic data to estimate the porosity model, but they show under- and over-fitting results, supporting the importance of using decomposed seismic data in porosity estimation.
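For readers unfamiliar with the metric, the coefficient of determination (R² score) quoted above can be computed as below. This is a generic sketch of the standard definition, not the authors' evaluation code; the function name and the toy porosity values are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical porosity values (fractions) at a handful of grid cells.
true_porosity = [0.10, 0.15, 0.20, 0.25, 0.30]
predicted_porosity = [0.11, 0.14, 0.21, 0.24, 0.31]
score = r2_score(true_porosity, predicted_porosity)
```

A score of 1 means a perfect fit, and a model that always predicts the mean porosity scores 0, so a value above 0.9 indicates that most of the spatial variance in porosity is captured.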
FastSecAgg: Scalable Secure Aggregation for Privacy-Preserving Federated Learning
Kadhe, Swanand, Rajaraman, Nived, Koyluoglu, O. Ozan, Ramchandran, Kannan
Recent attacks on federated learning demonstrate that keeping the training data on clients' devices does not provide sufficient privacy, as the model parameters shared by clients can leak information about their training data. A 'secure aggregation' protocol enables the server to aggregate clients' models in a privacy-preserving manner. However, existing secure aggregation protocols incur high computation/communication costs, especially when the number of model parameters is larger than the number of clients participating in an iteration, a typical scenario in federated learning. In this paper, we propose a secure aggregation protocol, FastSecAgg, that is efficient in terms of computation and communication, and robust to client dropouts. The main building block of FastSecAgg is a novel multi-secret sharing scheme, FastShare, based on the Fast Fourier Transform (FFT), which may be of independent interest. FastShare is information-theoretically secure, and achieves a trade-off between the number of secrets, privacy threshold, and dropout tolerance. Riding on the capabilities of FastShare, we prove that FastSecAgg (i) is secure against the server colluding with 'any' subset of some constant fraction (e.g. about 10%) of the clients in the honest-but-curious setting; and (ii) tolerates dropouts of a 'random' subset of some constant fraction (e.g. about 10%) of the clients. FastSecAgg achieves significantly smaller computation cost than existing schemes while achieving the same (orderwise) communication cost. In addition, it guarantees security against adaptive adversaries, which can perform client corruptions dynamically during the execution of the protocol.
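To make the goal of secure aggregation concrete, here is a minimal sketch using plain additive secret sharing over a prime field. This is deliberately not FastShare or FastSecAgg: it uses a simpler, swapped-in technique, has no FFT-based multi-secret sharing, and does not tolerate dropouts. The modulus `PRIME` and all function names are assumptions made for illustration.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus for this sketch

def share(value, n):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(client_values):
    """Aggregate integer client updates without revealing any single update.

    Client i splits its value into n shares and sends share j to client j.
    Each client then reports only the sum of the shares it received, so the
    server sees n partial sums whose total equals the sum of all inputs.
    """
    n = len(client_values)
    all_shares = [share(v % PRIME, n) for v in client_values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    return sum(partial_sums) % PRIME

total = secure_sum([3, 5, 7])
```

Each individual share is uniformly random, so no single message reveals a client's value. The cost of this naive scheme grows with the number of clients and parameters, and a single missing client breaks it, which illustrates the efficiency and dropout-robustness concerns the abstract raises.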
Cooperative Lane Changing via Deep Reinforcement Learning
Wang, Guan, Hu, Jianming, Li, Zhiheng, Li, Li
In this paper, we study how to learn an appropriate lane changing strategy for autonomous vehicles by using deep reinforcement learning. We show that the reward of the system should consider the overall traffic efficiency instead of the travel efficiency of an individual vehicle. In summary, cooperation, rather than competition, leads to a more harmonious and efficient traffic system.
Raspberry-picking MACHINES will replace dwindling numbers of migrant farm workers
Hours spent toiling away under the beating sun to harvest berries and fruit may soon be a thing of the past as robots look set to replace humans in the field. A £700,000 machine built by the University of Plymouth has succeeded in plucking a raspberry from a plant and carefully placing it in a punnet. The painstaking process takes a whole minute to get one berry because it requires a combination of soft robotics, clever AI and 'deep learning'. It stands around six foot tall (1.8 metres) and will combat a continued drop in the number of migrant farm workers available for the arduous harvests. Fieldwork Robotics, a spin-off from the university dedicated to agricultural robots, built the machine and says it will be able to pick 25,000 fruits a day in the future.
M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
Shen, Yelong, Chen, Jianshu, Huang, Po-Sen, Guo, Yuqing, Gao, Jianfeng
Learning to walk over a graph towards a target node for a given query and a source node is an important problem in applications such as knowledge base completion (KBC). It can be formulated as a reinforcement learning (RL) problem with a known state transition model. To overcome the challenge of sparse rewards, we develop a graph-walking agent called M-Walk, which consists of a deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN encodes the state (i.e., history of the walked path) and maps it separately to a policy and Q-values. In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards. From these trajectories, the network is improved in an off-policy manner using Q-learning, which modifies the RNN policy via parameter sharing. Our proposed RL algorithm repeatedly applies this policy-improvement step to learn the model. At test time, MCTS is combined with the neural policy to predict the target node. Experimental results on several graph-walking benchmarks show that M-Walk is able to learn better policies than other RL-based methods, which are mainly based on policy gradients. M-Walk also outperforms traditional KBC baselines.
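The off-policy Q-learning step mentioned above can be illustrated with a tabular update on a toy graph. This sketch is a simplification under stated assumptions: M-Walk uses an RNN over the path history and MCTS-generated trajectories, whereas here the state is just the current node, the Q-function is a dictionary, and the trajectory, `alpha`, and `gamma` are hypothetical.

```python
def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """One Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)  # terminal states have no actions
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Toy graph walk S -> A -> B, where reaching the target node B gives reward 1.
# Each step is (state, action, reward, next_state, actions available next).
trajectory = [
    ("S", "A", 0.0, "A", ["B"]),
    ("A", "B", 1.0, "B", []),
]

q_table = {}
# Replay the trajectory backwards so the sparse terminal reward propagates
# to earlier steps in a single pass.
for state, action, reward, next_state, next_actions in reversed(trajectory):
    q_update(q_table, state, action, reward, next_state, next_actions)
```

Because the update bootstraps from the maximum over next actions rather than from the action actually taken, it can learn from trajectories produced by a different behaviour policy, which is what makes the method off-policy.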
Recurrent Transformer Networks for Semantic Correspondence
Kim, Seungryong, Lin, Stephen, Jeon, Sang Ryul, Min, Dongbo, Sohn, Kwanghoon
Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations. By directly estimating the transformations between an image pair, rather than employing spatial transformer networks to independently normalize each individual image, we show that greater accuracy can be achieved. This process is conducted in a recursive manner to refine both the transformation estimates and the feature representations. In addition, a technique is presented for weakly-supervised training of recurrent transformer networks (RTNs) that is based on a proposed classification loss. With RTNs, state-of-the-art performance is attained on several benchmarks for semantic correspondence.